Probabilistic Data Integration

Authors

  • Matteo Magnani
  • Danilo Montesi
Abstract

In this paper we propose and experimentally evaluate a data integration approach in which the uncertainty generated while comparing and merging the input data sources is included in the resulting mediated schema and can be used to provide richer answers to users. We describe a system implementing our method and use it to empirically study the impact of uncertainty management on the effectiveness and efficiency of the data integration process. In particular, we test our approach on benchmark datasets, showing that taking uncertainty into account may increase the recall of the method, and on real databases, showing that it can be applied to large data sources.

Department of Computer Science, University of Bologna, Mura A. Zamboni 7, 40127 Bologna, Italy.

Similar resources

Dealing with Uncertainty in Lexical Annotation

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. an annotation is associated with a probability value....

Tuple Merging in Probabilistic Databases

Real-world data are often uncertain and incomplete. In probabilistic relational data models, uncertainty can be modeled on two levels: first, by representing the uncertain instance of a tuple by a set of possible instances, and second, by assigning each tuple its degree of membership in the considered relation. To overcome incompleteness, data from multiple sources need to be combined. In orde...

Probabilistic Data Integration Systems

Current data integration techniques are successful at managing well-defined and well-understood data integration tasks, but do not cope well with uncertainty. However, the amount of uncertain data is growing with the number and variety of data sources being integrated, both in traditional data integration tasks such as enterprise data integration, and in next-generation integration problems, such as c...

Uncertainty in data integration systems: automatic generation of probabilistic relationships

We propose a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is a probabilistic word sense disambiguation (PWSD), which makes it possible to automatically lexically annotate (i.e. annotation w.r.t. a...

Integration of Probabilistic Uncertain Information

We study the problem of data integration from sources that contain probabilistic uncertain information. Data is modeled by possible-worlds with probability distribution, compactly represented in the probabilistic relation model. Integration is achieved efficiently using the extended probabilistic relation model. We study the problem of determining the probability distribution of the integration...

An Approach to Probabilistic Data Integration for the Semantic Web

In previous work, we have introduced probabilistic description logic programs for the Semantic Web, which combine description logics, normal programs under the answer set (resp., well-founded) semantics, and probabilistic uncertainty. In this paper, we continue this line of research. We propose an approach to probabilistic data integration for the Semantic Web that is based on probabilistic des...

Publication date: 2008